Small-space encoding LCE data structure with constant-time queries

Authors

  • Yuka Tanimura
  • Takaaki Nishimoto
  • Hideo Bannai
  • Shunsuke Inenaga
  • Masayuki Takeda
Abstract

The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(zτ + n/τ) words of space which answers LCE queries in O(1) time and can be built in O(n log σ) time, where 1 ≤ τ ≤ √n is a parameter, z is the size of the Lempel-Ziv 77 factorization of w, and σ is the alphabet size. This is an encoding data structure, i.e., it does not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:

  • For highly repetitive strings where the zτ term is dominated by n/τ, we obtain a constant-time and sub-linear space LCE query data structure.
  • Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we can still obtain a constant-time and sub-linear space LCE data structure for suitable τ and for σ ≤ 2.
  • The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] can be “surpassed” in some cases with our LCE data structure.
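To make the query in the problem statement concrete, here is a minimal Python sketch of an LCE query answered by direct character comparison. It is only a baseline illustrating the definition, not the data structure of the paper: it keeps the string w, uses no preprocessing, and takes time proportional to the answer. The function name `lce_naive` and the 0-based indexing are assumptions made for this illustration.

```python
def lce_naive(w: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of w[i:] and w[j:].

    Positions are 0-based (an assumption of this sketch). The string w is
    scanned directly, so this is not an encoding data structure.
    """
    n = len(w)
    k = 0
    # Extend the match one character at a time until a mismatch
    # or the end of either suffix is reached.
    while i + k < n and j + k < n and w[i + k] == w[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    # Suffixes "abracadabra" and "abra" share the prefix "abra".
    print(lce_naive("abracadabra", 0, 7))
```

An encoding data structure in the sense of the abstract would answer the same query without reading w at query time.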

Similar resources

Longest Common Extensions in Trees

The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries. In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and ...

Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Given a string S of n symbols, a longest common extension query LCE(i, j) asks for the length of the longest common prefix of the ith and jth suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42–50, 2014, Proc. CPM 2015:65–76) described several data structures for answeri...

Time-Space Trade-Offs for Longest Common Extensions

We revisit the longest common extension (LCE) problem, that is, preprocess a string T into a compact data structure that supports fast LCE queries. An LCE query takes a pair (i, j) of indices in T and returns the length of the longest common prefix of the suffixes of T starting at positions i and j. We study the time-space trade-offs for the problem, that is, the space used for the data structu...

Fast Longest Common Extensions in Small Space

In this paper we address the longest common extension (LCE) problem: to compute the length l of the longest common prefix between any two suffixes of a text T over the alphabet Σ = {0, ..., σ − 1}. We present two fast and space-efficient solutions based on (Karp-Rabin) fingerprinting and sampling. Our first data structure exploits properties of Mersenne prime numbers when used as moduli of the Karp-Rabin hash ...
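The general idea of fingerprint-based LCE can be sketched as follows: store Karp-Rabin fingerprints of all prefixes of T, then binary-search the largest length at which the fingerprints of the two substrings agree. This is a textbook-style illustration, not the data structures of the paper above: it uses O(n) words of space with no sampling, and the class name `FingerprintLCE`, the fixed base, and the choice of modulus 2^61 − 1 are assumptions made for this sketch. Answers are Monte Carlo, since a fingerprint collision could cause an overestimate.

```python
MERSENNE_P = (1 << 61) - 1  # 2^61 - 1, a Mersenne prime used as the modulus


class FingerprintLCE:
    """LCE via prefix fingerprints and binary search (illustrative sketch)."""

    def __init__(self, text: str, base: int = 1_000_003):
        # `base` is an arbitrary fixed choice for this sketch; a random base
        # is normally used to bound the collision probability.
        self.n = len(text)
        self.prefix = [0] * (self.n + 1)  # prefix[k] = fingerprint of text[:k]
        self.power = [1] * (self.n + 1)   # power[k] = base^k mod p
        for k, ch in enumerate(text):
            self.prefix[k + 1] = (self.prefix[k] * base + ord(ch) + 1) % MERSENNE_P
            self.power[k + 1] = (self.power[k] * base) % MERSENNE_P

    def _fp(self, i: int, length: int) -> int:
        # Fingerprint of text[i : i + length], derived from prefix fingerprints.
        return (self.prefix[i + length] - self.prefix[i] * self.power[length]) % MERSENNE_P

    def lce(self, i: int, j: int) -> int:
        # Find the largest length whose substring fingerprints agree.
        lo, hi = 0, self.n - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self._fp(i, mid) == self._fp(j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo


if __name__ == "__main__":
    index = FingerprintLCE("abracadabra")
    print(index.lce(0, 7))
```

Each query costs O(log n) fingerprint comparisons here; reducing the space below O(n) words, for example by storing fingerprints only at sampled positions, is where the cited work goes beyond this sketch.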

Fully Dynamic Data Structure for LCE Queries in Compressed Space

A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of the suffixes starting at two given positions. We show that the signature encoding G of size w = O(min(z log N log* M, N)) [Mehlhorn et al., Algorithmica 17(2):183–198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(log N ...

Journal:
  • CoRR

Volume abs/1702.07458

Publication year: 2017